Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity

Author

  • Andreas van Cranenburgh
Abstract

It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discontinuous constituents possible, with favorable results. We improve on this by applying Data-Oriented Parsing (DOP) to a mildly context-sensitive grammar formalism which allows for discontinuous trees. Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient. Our results emulate and surpass the state of the art in discontinuous parsing.

Acknowledgments: I am grateful to Wolfgang Maier for correspondence and making the code of his parser rparse available, which was an invaluable resource. Federico Sangati offered advice and, at a crucial moment, convinced me the results might be good enough for a paper. Sandra Kübler provided the Tepacoc corpus. Tikitu de Jager prompted me to take typography seriously. Mark Dufour improved his compiler specifically to optimize my parser. Henk Zeevat persuaded me to pursue the master of logic. Rens Bod introduced me to discontinuous constituents in one of his lectures. Finally, I am most grateful for the continuous moral support from my supervisor.

Similar resources

Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar

Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the...

Discontinuous Parsing with an Efficient and Accurate DOP Model

We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We g...

Rich Statistical Parsing and Literary Language

This thesis applies the Data-Oriented Parsing framework in two areas: parsing & literature. The data-oriented approach rests on the assumption that re-use of chunks of training data can be detected and exploited at test time. Syntactic tree fragments form the common thread in the thesis. Chapter 2 presents a method to efficiently extract them from treebanks, based on heuristic...

Aspects of Pattern-matching in Data-Oriented Parsing

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structures are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of double work, brought on by the concept of multiple derivation...

Publication date: 2011